Risk prediction of advanced colorectal neoplasia among diabetic patients: A derivation and validation study

Abstract
Background and Aim: Colorectal cancer (CRC) is the third most common cancer in the world. This study devises and validates a clinical scoring system for risk prediction of advanced colorectal neoplasia (ACN) to guide colonoscopy evaluation among diabetic patients.
Methods: We identified 55 964 diabetic patients who received colonoscopies from a large database in a Chinese population (2008–2018). We recruited a derivation cohort based on random sampling. Risk factors for CRC identified in univariate analysis were examined for association with ACN (defined as advanced adenoma, CRC, or any combination thereof) using binary logistic regression analysis. We used the adjusted odds ratios (aORs) for independent risk factors to devise a risk score ranging from 0 to 6: 0–4 "average risk" (AR) and 5–6 "high risk" (HR). The remaining subjects acted as an independent validation cohort.
Results: The prevalence of ACN in both the derivation and validation cohorts was 2.0%. Using the scoring system constructed, 78.5% and 21.5% of patients in the validation cohort were classified as AR and HR, respectively. The prevalence of ACN in the AR and HR groups was 1.5% and 4.1%, respectively. Individuals in the HR group had a 2.78-fold higher prevalence of ACN than those in the AR group. The concordance (c-) statistic was 0.70, implying a good discriminatory capability of the risk score to stratify high-risk individuals who should consider colonoscopy.
Conclusion: The clinical risk scoring system based on age, gender, smoking, presence of hypertension, and use of aspirin is useful for ACN risk prediction among diabetic patients.


Introduction
Colorectal cancer (CRC) is the third most commonly diagnosed cancer. 1 In 2019, there were more than 1.9 million new CRC cases worldwide. 1 It ranked as the second leading cause of cancer mortality, accounting for almost 1 million deaths globally in 2019. 1 It is projected that CRC incidence and mortality will continue to increase and impose a heavy public health burden. 2 A substantial body of evidence shows that early colonoscopy screening can reduce the associated morbidity and mortality through detection and removal of premalignant lesions. 3 However, colonoscopy resources are limited, and adherence to colonoscopy is often suboptimal when compared with other screening tools. 4 Risk prediction scores for advanced colorectal neoplasia (ACN) could therefore prioritize high-risk subjects for colonoscopy screening. 5

The global prevalence of diabetes mellitus increased almost fourfold from 108 million in 1980 to 422 million in 2014, and it was expected to rise to almost 600 million by 2035. 6 Diabetes mellitus can lead to multisystem complications, and studies have found that diabetes mellitus is associated with a higher risk of CRC. 7 A meta-analysis of 15 studies involving 2.5 million patients demonstrated a significantly increased relative risk (RR) (by 30%) of developing CRC in diabetes mellitus patients when compared with their healthy counterparts. 8 Nevertheless, there is a scarcity of risk algorithms to predict ACN among diabetic patients. The proportion of the older population with diabetes mellitus was around 20%, 7 highlighting the need to develop a risk stratification tool for this high-risk group requiring intensive clinical attention.

This study aimed to develop and validate a clinical risk stratification score for ACN prediction among diabetic patients. A simple tool for clinicians, based on easy-to-collect information, could assist in the identification of diabetes mellitus subjects at higher risk for ACN. It could also inform the starting age and frequency of CRC screening among diabetes mellitus patients, thereby guiding the allocation of colonoscopy resources to high-risk individuals.

Methods
Settings. The present study was performed in accordance with the Declaration of Helsinki and submitted to the Survey and Behavioral Research Ethics Committee of the Chinese University of Hong Kong for ethics approval. This was a population-based retrospective cohort study with baseline recruitment between 1 January 2008 and 31 December 2018. Data were extracted from the Hospital Authority Data Collaboration Lab (HADCL), a platform providing access to an electronic healthcare database that consists of patient demographic data, clinical diagnoses, procedures, drug prescriptions, and laboratory results from all public hospitals and clinics in Hong Kong. It represents both in-patient and out-patient data of about 80% of the 7.49 million people in our locality. We have previously validated the database and reported a high level of completeness of patients' demographic profiles (100%) and prescription details (99.8%). 9-12

Study subjects. Medical conditions were ascertained from the electronic healthcare database. Inclusion criteria were all Chinese diabetes mellitus patients with gastrointestinal (GI) symptoms, aged 18 years or older, who had received at least one colonoscopy and were managed in General Outpatient Clinics (GOPC) of the Hong Kong Government. Sociodemographic data including age, sex, body mass index (BMI), smoking, drinking, age at diabetes mellitus diagnosis, duration of diabetes mellitus, concomitant medical conditions, use of medications, age at first colonoscopy, histopathology findings of removed polyps, results from physical examination, relevant laboratory investigations, and drug prescriptions were collected. The duration of diabetes mellitus refers to the time between the diabetes mellitus diagnosis and the baseline diabetes mellitus complication screening the patient received. The age at diabetes mellitus diagnosis refers to the difference between the self-reported year of diabetes mellitus diagnosis, recorded during the baseline assessment, and the year of birth. The age at first colonoscopy refers to the age of the patient when they received their first colonoscopy in any public hospital.

All colonoscopies in this study were conducted at accredited endoscopy centers affiliated with public hospitals. The procedures were performed by experienced colonoscopy practitioners with informed consent of the screening participants. At all centers, bowel preparation was accomplished mostly using split-dose polyethylene glycol. During all colonoscopy examinations, standard air insufflation was carried out. The bowel preparation regimens used at all centers were recommended by international guidelines to support high-quality procedural standards. A withdrawal time of ≥6 min was targeted in each procedure to comply with current colonoscopy quality indicators, as provided by the database. The practitioners identified and removed all lesions suitable for polypectomy, which were sent to an accredited pathology center for detailed histopathological examination. We identified patients with ACN, defined as CRC, or any colorectal adenoma with a size of ≥10 mm in diameter, high-grade dysplasia, villous or tubulovillous histologic characteristics, or any combination of them. Patients without these codes were considered control subjects. For subjects who received more than one colonoscopy, the first colonoscopy was considered in the analysis.
Development of the risk scores. The process of development of the risk scores is presented in Figure 1. The associations between risk factors and the colonoscopic finding of ACN were examined by Pearson chi-square tests in the derivation cohort. Risk factors examined included age, sex, BMI, smoking (current/ex-smoker vs non-smoker), alcohol drinking (current and social drinkers vs non-drinkers), age at diabetes mellitus diagnosis, duration of diabetes mellitus, age at first colonoscopy, self-reported medical conditions (including hypertension, ischemic heart disease, stroke, and cirrhosis), and use of medications as documented in the computerized system (including aspirin, metformin, and insulin). Variables with a P value <0.05 in the univariate analysis were included in a binary logistic regression model, with ACN as the outcome. Each of these risk factors was then assigned a weighting in the risk score equal to its adjusted odds ratio (aOR) halved and rounded to the nearest integer, for simplicity. The sum of the weightings of all risk factors present was the risk score for each individual. We constructed a receiver operating characteristic (ROC) curve and used the area under the curve (AUC) to examine the discriminatory capability of the score.
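The weighting rule described above (aOR halved, then rounded to the nearest integer) can be sketched as follows. This is only an illustration: the aOR values below are hypothetical placeholders and are not the study's estimates, and the tie-breaking rule (rounding halves upward) is an assumption, since the text does not specify one.

```python
import math


def weight_from_aor(aor: float) -> int:
    """Halve an adjusted odds ratio and round to the nearest integer.

    Ties (x.5) are rounded upward; this tie rule is an assumption.
    """
    return math.floor(aor / 2 + 0.5)


# Hypothetical aORs for illustration only (NOT the study's estimates).
hypothetical_aors = {
    "age_60_or_above": 3.1,
    "male": 1.6,
    "ever_smoker": 1.4,
}

# Convert each aOR into an integer weighting for the risk score.
weights = {name: weight_from_aor(aor) for name, aor in hypothetical_aors.items()}
print(weights)
```

Halving compresses the weightings so that the total score stays within a small integer range, which is what keeps the score usable at the point of care.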

Statistical analysis
Data were entered and analyzed using R version 3.5.2. The prevalence of ACN according to each score in the derivation cohort was evaluated. Scores with an ACN prevalence above the overall prevalence were assigned to the "high risk" (HR) category, while those with a prevalence closest to or below the overall prevalence were categorized as "average risk" (AR). A separate binary logistic regression model was constructed in the validation cohort, using the significant risk factors identified in the derivation cohort, to evaluate the aOR of each risk factor. The aORs were compared between the two cohorts. The Hosmer-Lemeshow goodness-of-fit test statistic was adopted for the reliability assessment of the final model, with a P value >0.05 indicating a good match between predicted and observed risk. The ability of the scoring system to predict the risk of ACN was evaluated using the c-statistic and the AUC of the ROC curve, with a P value <0.05 considered statistically significant. To evaluate the resources required if the scoring system is implemented to refer subjects classified as HR for colonoscopy with polypectomy, we computed the number needed to screen (NNS) to detect one ACN. It is defined as the inverse of the predicted outcome probability from the regression model.
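For readers less familiar with the c-statistic, the sketch below computes it directly from its definition: the probability that a randomly chosen subject with ACN has a higher risk score than a randomly chosen subject without ACN, with ties counted as one half. The study's analysis was run in R; this Python version and its toy scores are illustrative assumptions, not study data.

```python
from itertools import product


def c_statistic(case_scores, control_scores):
    """Concordance probability: P(case score > control score), ties count 0.5."""
    concordant = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Toy risk scores for illustration only (NOT study data).
toy_case_scores = [5, 4, 6, 3]        # subjects with ACN
toy_control_scores = [2, 3, 4, 1, 5]  # subjects without ACN

print(c_statistic(toy_case_scores, toy_control_scores))  # 0.775
```

For a binary outcome this quantity equals the AUC of the ROC curve, which is why the two terms are used interchangeably in the text.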

Results
Participant characteristics. A total of 55 964 diabetes mellitus patients who received colonoscopy from a large database were included, consisting of 39 175 subjects in the derivation cohort and 16 789 subjects in the validation cohort (Table 1). The characteristics of the derivation cohort were similar to those of the validation cohort in terms of age at diabetes mellitus diagnosis, age at first colonoscopy, BMI, male proportion, smoking history, alcohol consumption, duration of diabetes mellitus, the prevalence of hypertension, ischemic heart disease, stroke, cirrhosis, use of aspirin, metformin, and insulin (all P values >0.05). The cohort of the subset refers to subjects who received a baseline diabetes mellitus complication screening between the years 2013 and 2018; the characteristics of the derivation cohort were also similar to those of the validation cohort, except for a slight difference in the prevalence of hypertension between the derivation cohort and validation cohort (66.6% vs 65.1%, P = 0.037).
The prevalence of ACN in the derivation cohort according to risk factors is shown in Table 2. The overall prevalence of ACN was 2.0% (n = 325), with a higher rate in subjects who were male (2.4%), older at diabetes mellitus diagnosis (>70 years old: 3.2%), older at first colonoscopy (>60 years old: 3.1%), current or past smokers (2.6%), and hypertensive (2.3%).
Development of the risk score. From the aORs of the derivation cohort, the following risk factors were used to assign scores to the participants (Table 4): age at first colonoscopy below 60 years old (0), 60 years old or above (2); female gender (0), male gender (1); non-smoker (0), current/ex-smoker (1); no hypertension (0), hypertension (1); and use of aspirin (0), no use of aspirin (1). The total score of each participant was the sum of all points allocated to each risk factor and ranged from 0 to 6. The proportion of subjects having different scores is shown in Table 5. The proportion of participants with scores 0-6 was 1.0%, 10.3%, 18.0%, 24.0%, 24.8%, 15.2%, and 6.6%, respectively. The prevalence of ACN in participants with a score of 0-6 was 0.0%, 0.4%, 0.7%, 1.4%, 2.3%, 4.0%, and 4.7%, respectively. The prevalence of ACN at score 4 was the closest to the overall prevalence in the derivation cohort (2.3% vs 2.0%). Therefore, scores of 0-4 were classified as "AR," while scores of 5-6 were designated as "HR" because they had a higher prevalence of ACN than that of all study participants. Using this stratification method, 78.2% of patients in the derivation cohort had average risk, and the remaining 21.8% of patients had high risk.
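The point allocation and the AR/HR cut-off described above can be expressed as a short function. The point values and the threshold are taken from the text; the function and parameter names are hypothetical and chosen only for this illustration.

```python
def acn_risk_score(age_at_first_colonoscopy: int, is_male: bool,
                   ever_smoker: bool, has_hypertension: bool,
                   uses_aspirin: bool) -> int:
    """Sum the points for the five risk factors; the result ranges from 0 to 6."""
    score = 0
    score += 2 if age_at_first_colonoscopy >= 60 else 0  # 60 or above: 2 points
    score += 1 if is_male else 0                         # male gender: 1 point
    score += 1 if ever_smoker else 0                     # current/ex-smoker: 1 point
    score += 1 if has_hypertension else 0                # hypertension: 1 point
    score += 0 if uses_aspirin else 1                    # NO aspirin use: 1 point
    return score


def risk_tier(score: int) -> str:
    """Scores 0-4 are average risk (AR); scores 5-6 are high risk (HR)."""
    return "HR" if score >= 5 else "AR"


# Example: a 65-year-old male non-smoker with hypertension, not on aspirin.
example_score = acn_risk_score(65, True, False, True, False)
print(example_score, risk_tier(example_score))  # 5 HR
```

Note that aspirin is scored inversely: the point is given for the absence of aspirin use.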
Validity and reliability of the model. Out of the 325 ACN cases detected in the derivation cohort, 175 were categorized as AR and 150 as HR (Table 6). The prevalence of ACN was higher in the HR group (4.2%) than in the AR group (1.4%). In the validation cohort, 78.5% and 21.5% of subjects were in the AR and HR categories, respectively, proportions similar to those in the derivation cohort. Of the 139 cases in the validation cohort, 79 and 60 patients were categorized as AR and HR, respectively. The prevalence of ACN in the AR and HR tiers was 1.5% (95% CI 1.15-1.80%) and 4.1% (95% CI 3.08-5.12%), respectively. Compared with participants in the AR group, subjects in the HR group had a statistically significantly higher risk of ACN (RR 2.78, 95% CI 2.00-3.86, P = 0.005). The NNS was 24 and 67 for the HR and AR groups, respectively.
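The reported NNS values can be checked with a short calculation. The sketch below assumes that the observed group prevalence in the validation cohort stands in for the predicted outcome probability, and that the result is rounded to the nearest whole number; both are assumptions made for this illustration, and the function name is hypothetical.

```python
def number_needed_to_screen(prevalence: float) -> int:
    """Inverse of the outcome probability, rounded to the nearest integer."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return round(1 / prevalence)


# ACN prevalence reported for the validation cohort.
nns_hr = number_needed_to_screen(0.041)  # high-risk group, 4.1%
nns_ar = number_needed_to_screen(0.015)  # average-risk group, 1.5%
print(nns_hr, nns_ar)  # 24 67
```

Under these assumptions the calculation reproduces the reported NNS of 24 (HR) and 67 (AR).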

Discussion
This study devised and validated an ACN risk prediction score using patients' age, gender, smoking habits, the presence of hypertension, and the use of aspirin as scoring criteria. We found that the scoring system could successfully stratify diabetes mellitus patients into average risk (1.5%) versus high risk of harboring ACN (4.1%) with a good discriminatory capability (c-statistic 0.70). We used multivariate regression analysis to select the model (N0-N4) with the best goodness of fit and therefore the highest reliability. When the scoring system was applied to diabetes mellitus patients, the NNS to detect one ACN was also small, suggesting that its implementation could lead to efficient use of colonoscopy.

A recent systematic review and meta-analysis summarized the available evidence on risk scores for the prediction of ACN among populations undergoing colonoscopy. 13 The review identified 22 studies comprising 17 original risk scores. The median number of predictor variables was five, and the most commonly included predictors were age, sex, family history of CRC in first-degree relatives, BMI, and smoking habits. The c-statistic, or AUC in binary logistic regression analysis, ranged from 0.62 to 0.77 in individual studies and from 0.61 to 0.70 in the pooled analysis. The present risk score contains five predictors with a relatively high c-statistic, allowing an accurate prediction of ACN.

To our knowledge, there has been only one study that developed a risk score for the prediction of ACN for diabetes mellitus patients. 14 The algorithm generated seven predictors, including age, gender, BMI, age of diabetes mellitus onset, the use of antidiabetic treatments, HbA1c, and C-reactive protein. Nonetheless, that study involved a single center with a relatively modest sample size. Also, the performance of that scoring algorithm has not yet been evaluated, and no studies have presented a validated score that could be directly applied in clinical practice for diabetes mellitus patients. Otherwise, previous algorithms focused on average-risk general populations. Development of such a risk scoring system for diabetes mellitus patients could therefore address an important knowledge gap, as diabetes mellitus has been well recognized as a significant risk factor for ACN.

Previous literature showed that diabetes mellitus patients had a higher risk for CRC than their healthy counterparts in China (OR = 4.97). 15 Endogenous hyperinsulinemia has been proposed as one possible mechanism driving the higher risk of CRC among type 2 diabetes mellitus patients. 16 In addition, insulin could promote cell proliferation and reduce apoptosis by enhancing the bioavailability of insulin-like growth factor-1 (IGF-1). 17-20 High serum levels of insulin have been independently related to an increased risk of CRC. 21,22 Therefore, in addition to common risk factors, our study also considered other possible risk factors for ACN that are related to diabetes mellitus, such as age at diabetes mellitus diagnosis, duration of diabetes mellitus, and concomitant medical conditions, with the aim of building a more comprehensive ACN prediction model for patients with diabetes mellitus.
This clinical risk score is easy to use by primary care practitioners including clinicians, nurse educators, allied health professionals, and patients who are considering receiving colonoscopy for early detection of ACN. The data needed to assess ACN risk with this score are easy to obtain in both clinical and community settings, as all of this elementary information can be collected by patient self-report. The risk algorithm adds to the current literature because it can evaluate the risk of ACN for individual patients, which provides an objective basis for their decision to receive colonoscopy. For instance, diabetes mellitus patients assigned as having a high risk for ACN could choose an earlier colonoscopy, while those at average risk might choose surveillance or noninvasive tests that are less sensitive to ACN, such as fecal immunochemical tests (FITs). This prediction algorithm may improve colonoscopy yield and optimize the cost-effectiveness of colonoscopy workup. Furthermore, identifying precursors of CRC such as ACN realizes the benefit of colonoscopy through early detection and removal of lesions, thus improving patients' quality of life and survival rates. 14 It has previously been shown that knowledge of one's risk for CRC could influence one's screening behavior over time. 23 Therefore, the adoption of this scoring system in clinical consultations might also enhance risk communication between patients and the attending physicians, which could potentially improve patients' awareness of their own risk of ACN and adherence to colonoscopy workups.

Strengths and limitations
This study collected patient information over a long period, and the large sample size represents one of its strengths. Individuals were assigned to the derivation or validation cohort using computer-generated random numbers; this allocation process was designed to ensure unbiased representation within each cohort, thereby minimizing the risk of selection bias. In addition, both the predictors and colonoscopy outcomes are based on objective data retrieved from the computer system, which has proven to be complete and accurate in our previous evaluations. 9

Nevertheless, there are several limitations which should be addressed. First, although each diabetes mellitus patient recruited in this study did not have any CRC symptoms such as rectal bleeding and change of bowel motion, these complaints might not have been coded in the computer system, as they could have been written in the free text consultation notes in the computerized system. However, we have previously validated this dataset and found that the proportion of data completeness was up to 99.98%. Second, we have not taken into account other recognized risk factors of CRC, such as dietary habits, 24,25 physical activity, sedentary behaviors, and waist circumference, which has been considered more accurate than BMI to predict CRC. 26 Yet collection of these parameters requires validated questionnaires, which are often time-consuming and infeasible, especially in busy clinical settings. Third, although the risk scoring system may inform the relative urgency of colonoscopy workup based on the risk of ACN, our study does not prescribe a particular period within which colonoscopy should be arranged for subjects with HR and AR for ACN. Lastly, all patients in this study were ethnic Chinese, and thus the generalizability of the findings to other ethnic groups may be limited. 27,28 A multinational prospective cohort study involving subjects of nine ethnicities reported an increased risk for ACN among Japanese, Korean, and Chinese relative to other ethnic groups. 29 As the risk of ACN differs between symptomatic and asymptomatic adults, and our results were based only on the symptomatic population, we hope to extend the score into a more comprehensive one that also covers asymptomatic diabetic patients.

Conclusion
In sum, we have devised a validated clinical scoring system for the prediction of ACN in a Chinese diabetes mellitus population. Its use may be associated with more efficient use of colonoscopy and contribute to early detection of ACN among high-risk diabetes mellitus patients. External validation of the score should be performed in other population groups with different characteristics. Future studies should examine its feasibility, acceptability, cost-effectiveness, and proportion of CRC reduction when implemented in clinical and community settings. Inclusion of other predictors such as polygenic risk score and other emerging risk factors of ACN could be performed in future evaluations.

Figure 1
Figure 1 Flow diagram for development of a derivation and validation cohort.

Table 1
Characteristics of participants in the derivation and validation cohorts
† Time between diagnosis of diabetes and first diabetes complication screening assessment.
‡ ACN is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥10 mm in diameter, high-grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.
§ Alcohol drinking included current drinker and social drinker.
BMI, body mass index; IHD, ischemic heart disease; N, number.

Table 2
Distribution of risk factors among subjects with ACN and without ACN in the derivation cohort
† Alcohol drinking included current drinker and social drinker.
‡ Time between diagnosis of diabetes and first diabetes complication screening assessment.
§ ACN is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥10 mm in diameter, high-grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.
BMI, body mass index; IHD, ischemic heart disease.

Table 3
Univariate and multivariate predictors of ACN in the derivation cohort
§ Alcohol drinking included current drinker and social drinker.
† Time between diagnosis of diabetes and first diabetes complication screening assessment.
* ACN is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥10 mm in diameter, high-grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.
Values in bold refer to results with P values less than 0.05.
Abbreviations: aOR, adjusted odds ratio; AUC, area under the curve; BMI, body mass index; IHD, ischemic heart disease; OR, odds ratio.

Table 4
Colorectal Screening score for prediction of risk for ACN † ACN is defined as colorectal cancer, or any colorectal adenoma which has a size of ≥10 mm in diameter, high-grade dysplasia, villous or tubulovillous histologic characteristics, or any combination thereof.